A Trace Cache Microarchitecture and Evaluation
نویسندگان
چکیده
As the instruction issue width of superscalar processors increases, instruction fetch bandwidth requirements will also increase. It will eventually become necessary to fetch multiple basic blocks per clock cycle. Conventional instruction caches hinder this effort because long instruction sequences are not always in contiguous cache locations. Trace caches overcome this limitation by caching traces of the dynamic instruction stream, so instructions that are otherwise noncontiguous appear contiguous. In this paper we present and evaluate a microarchitecture incorporating a trace cache. The microarchitecture provides high instruction fetch bandwidth with low latency by explicitly sequencing through the program at the higher level of traces, both in terms of (1) control flow prediction and (2) instruction supply. For the SPEC95 integer benchmarks, trace-level sequencing improves performance from 15% to 35% over an otherwise equally-sophisticated, but contiguous multipleblock fetch mechanism. Most of this performance improvement is due to the trace cache. However, for one benchmark whose performance is limited by branch mispredictions, the performance gain is due almost entirely to improved prediction accuracy.
منابع مشابه
Trace Cache Performance
Instruction fetch mechanism is a performance bottleneck of a Superscalar Processor. Fetch performance can be improved with the aid of an instruction memory known as a Trace Cache. This paper presents analytical expressions, which describe instruction fetch performance of a Trace Cache microarchitecture. The instruction fetch rates predicted by the expressions differ by seven percent from the si...
متن کاملTurboscalar: A High Frequency High IPC Microarchitecture
There is significant performance motivation to build larger and wider superscalar machines, however the implementation complexity can be overwhelming. When superscalar machines grow they necessarily become deeper in order to maintain frequency. As the pipeline depth increases the performance gained by a wide instruction fetch and dispatch is lost to branch misprediction penalty cycles. This wor...
متن کاملSummarizing multiprocessor program execution with versatile, microarchitecture-independent snapshots
Computer architects rely heavily on software simulation to evaluate, refine, and validate new designs before they are implemented. However, simulation time continues to increase as computers become more complex and multicore designs become more common. This thesis investigates software structures and algorithms for quickly simulating modern cache-coherent multiprocessors by amortizing the time ...
متن کاملA survey of new research directions in microprocessors
Current microprocessors utilise the instruction-level parallelism by a deep processor pipeline and the superscalar instruction issue technique. VLSI technology offers several solutions for aggressive exploitation of the instruction-level parallelism in future generations of microprocessors. Technological advances will replace the gate delay by on-chip wire delay as the main obstacle to increase...
متن کاملMicroarchitecture Characteristics and Implications of Alignment of Multiple Bioinformatics Sequences
With the growth of bioinformatics and computational biology industry, multiple sequence alignment (MSA) applications have become an important emerging workload. In spite of the large amount of recent attention given to the MSA software design, there has been little quantitative understanding of the performance of such applications on modern microprocessors and systems. In this paper, we analyze...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- IEEE Trans. Computers
دوره 48 شماره
صفحات -
تاریخ انتشار 1999